TITLE OF THE INVENTION 
SPEECH SYNTHESIS APPARATUS, CONTROL METHOD THEREFOR,
AND COMPUTER-READABLE MEMORY

BACKGROUND OF THE INVENTION

The present invention relates to a speech 
synthesis apparatus for performing speech synthesis by 
using pitch marks, a control method for the apparatus, 
and a computer-readable memory. 
Conventionally, processing that synchronizes with
pitches has been performed as speech analysis/synthesis
processing and the like. For example, in a PSOLA (Pitch
Synchronous Overlap Adding) speech synthesis method,
synthetic speech is obtained by adding one-pitch speech
waveform element pieces in synchronism with pitches.
In this scheme, information (pitch mark) about the
position of each pitch must be recorded concurrently
with storage of speech waveform data.

In the prior art described above, however, the
size of a file on which pitch marks are recorded becomes
undesirably large.

SUMMARY OF THE INVENTION 
The present invention has been made in
consideration of the above problem, and has as its
object to provide a speech synthesis apparatus capable
of reducing the size of a file used to manage pitch
marks, a control method therefor, and a
computer-readable memory.

In order to achieve the above object, a speech
synthesis apparatus according to the present invention
has the following arrangement.

There is provided a speech synthesis apparatus for
performing speech synthesis by using pitch marks,
comprising:

first calculation means for calculating a distance
between first two pitch marks of a voiced portion of
speech data to be processed;

second calculation means for calculating a
difference between adjacent inter-pitch-mark distances;
and

management means for storing the calculation
results obtained by the first and second calculation
means in a file and managing the results.

In order to achieve the above object, a speech
synthesis apparatus according to the present invention
has the following arrangement.

There is provided a speech synthesis apparatus for
performing speech synthesis by using pitch marks,
comprising:

first comparison means for, when a length of
speech data to be processed is represented by d, and a
maximum value dmax and a minimum value dmin are defined
for a predetermined word length, comparing the length d
with the maximum value dmax;

second comparison means for comparing the length d
with the minimum value dmin on the basis of the
comparison result obtained by the first comparison means;

subtraction means for subtracting the maximum
value dmax or minimum value dmin from the length d on
the basis of the comparison results obtained by the
first and second comparison means; and

management means for storing the difference
obtained by the subtraction means or the length d in the
file and managing the difference or the length on the
basis of the comparison results obtained by the first
and second comparison means.

In order to achieve the above object, a speech
synthesis apparatus according to the present invention
has the following arrangement.

There is provided a speech synthesis apparatus for
performing speech synthesis by using pitch marks,
comprising:

storage means for storing a file for managing a
distance between first two pitch marks of a voiced
portion of speech data to be processed and a difference
between adjacent inter-pitch-mark distances;

first loading means for loading the distance
between the first two pitch marks of the voiced portion;

second loading means for loading the difference
between the adjacent inter-pitch-mark distances; and

calculation means for calculating a next pitch
mark position from a pitch mark position calculated
immediately before the calculation, a pitch mark
distance to an adjacent pitch mark, and the distance and
difference loaded by the first and second loading means.
In order to achieve the above object, a control
method for a speech synthesis apparatus according to the
present invention has the following steps.

There is provided a control method for a speech
synthesis apparatus for performing speech synthesis by
using pitch marks, comprising:

the first calculation step of calculating a
distance between first two pitch marks of a voiced
portion of speech data to be processed;

the second calculation step of calculating a
difference between adjacent inter-pitch-mark distances;
and

the management step of storing the calculation
results obtained in the first and second calculation
steps in a file and managing the results.

In order to achieve the above object, a control
method for a speech synthesis apparatus according to the
present invention has the following steps.

There is provided a control method for a speech
synthesis apparatus for performing speech synthesis by
using pitch marks, comprising:

the first comparison step of, when a length of
speech data to be processed is represented by d, and a
maximum value dmax and a minimum value dmin are defined
for a predetermined word length, comparing the length d
with the maximum value dmax;

the second comparison step of comparing the length
d with the minimum value dmin on the basis of the
comparison result obtained in the first comparison step;

the subtraction step of subtracting the maximum
value dmax or minimum value dmin from the length d on
the basis of the comparison results obtained in the
first and second comparison steps; and

the management step of storing the difference
obtained in the subtraction step or the length d in the
file and managing the difference or the length on the
basis of the comparison results obtained in the first
and second comparison steps.

In order to achieve the above object, a control
method for a speech synthesis apparatus according to the
present invention has the following steps.

There is provided a control method for a speech
synthesis apparatus for performing speech synthesis by
using pitch marks, comprising:

the storage step of storing a file for managing a
distance between first two pitch marks of a voiced
portion of speech data to be processed and a difference
between adjacent inter-pitch-mark distances;

the first loading step of loading the distance
between the first two pitch marks of the voiced portion;

the second loading step of loading the difference
between the adjacent inter-pitch-mark distances; and

the calculation step of calculating a next pitch
mark position from a pitch mark position calculated
immediately before the calculation, a pitch mark
distance to an adjacent pitch mark, and the distance and
difference loaded in the first and second loading steps.

In order to achieve the above object, a
computer-readable memory according to the present
invention has the following program codes.

There is provided a computer-readable memory
storing program codes for controlling a speech synthesis
apparatus for performing speech synthesis by using pitch
marks, comprising:

a program code for the first calculation step of
calculating a distance between first two pitch marks of
a voiced portion of speech data to be processed;

a program code for the second calculation step of
calculating a difference between adjacent
inter-pitch-mark distances; and

a program code for the management step of storing
the calculation results obtained in the first and second
calculation steps in a file and managing the results.

In order to achieve the above object, a
computer-readable memory according to the present
invention has the following program codes.

There is provided a computer-readable memory
storing program codes for controlling a speech synthesis
apparatus for performing speech synthesis by using pitch
marks, comprising:

a program code for the first comparison step of,
when a length of speech data to be processed is
represented by d, and a maximum value dmax and a minimum
value dmin are defined for a predetermined word length,
comparing the length d with the maximum value dmax;

a program code for the second comparison step of
comparing the length d with the minimum value dmin on
the basis of the comparison result obtained in the first
comparison step;

a program code for the subtraction step of
subtracting the maximum value dmax or minimum value dmin
from the length d on the basis of the comparison results
obtained in the first and second comparison steps; and

a program code for the management step of storing
the difference obtained in the subtraction step or the
length d in the file and managing the difference or the
length on the basis of the comparison results obtained
in the first and second comparison steps.

In order to achieve the above object, a
computer-readable memory according to the present
invention has the following program codes.

There is provided a computer-readable memory
storing program codes for controlling a speech synthesis
apparatus for performing speech synthesis by using pitch
marks, comprising:

a program code for the storage step of storing a
file for managing a distance between first two pitch
marks of a voiced portion of speech data to be processed
and a difference between adjacent inter-pitch-mark
distances;

a program code for the first loading step of
loading the distance between the first two pitch marks
of the voiced portion;

a program code for the second loading step of
loading the difference between the adjacent
inter-pitch-mark distances; and

a program code for the calculation step of
calculating a next pitch mark position from a pitch mark
position calculated immediately before the calculation,
a pitch mark distance to an adjacent pitch mark, and the
distance and difference loaded in the first and second
loading steps.

Other features and advantages of the present
invention will be apparent from the following
description taken in conjunction with the accompanying
drawings, in which like reference characters designate
the same or similar parts throughout the figures thereof.

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is a block diagram showing the arrangement
of a speech synthesis apparatus according to the first
embodiment of the present invention;

Fig. 2 is a flow chart showing pitch mark data
file generation processing executed in the first
embodiment of the present invention;

Fig. 3 is a view for explaining pitch marks in the
first embodiment of the present invention;

Fig. 4 is a flow chart showing another example of
the pitch mark data file generation processing executed
in the first embodiment of the present invention;

Fig. 5 is a flow chart showing another example of
the processing of recording the pitch marks of a voiced
portion in the first embodiment of the present
invention;

Fig. 6 is a flow chart showing pitch mark data
file loading processing executed in the second
embodiment of the present invention; and

Fig. 7 is a flow chart showing another example of
the processing of loading the pitch marks of a voiced
portion in the second embodiment of the present
invention.



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[First Embodiment] 

Fig. 1 is a block diagram showing the arrangement 
of a speech synthesis apparatus according to the first 
embodiment of the present invention. 
Reference numeral 103 denotes a CPU for performing
numerical operation/control, control on the respective
components of the apparatus, and the like, which are
executed in the present invention; 102, a RAM serving as
a work area for processing executed in the present
invention and a temporary saving area for various data,
and having an area for storing a pitch mark data file
101a; 101, a ROM storing various control programs such
as programs executed in the present invention, for
managing pitch mark data used for speech synthesis; 109,
an external storage unit serving as an area for storing
processed data; and 105, a D/A converter for converting
the digital speech data synthesized by the speech
synthesis apparatus into analog speech data and
outputting it from a loudspeaker 110.

Reference numeral 106 denotes a display control
unit for controlling a display 111 when the processing
state and processing results of the speech synthesis
apparatus, and a user interface are to be displayed; 107,
an input control unit for recognizing key information
input from a keyboard 112 and executing the designated
processing; 108, a communication control unit for
controlling transmission/reception of data through a
communication network 113; and 104, a bus for connecting
the respective components of the speech synthesis
apparatus to each other.
Pitch mark data file generation processing
executed in the first embodiment will be described next
with reference to Fig. 2.

Fig. 2 is a flow chart showing pitch mark data
file generation processing executed in the first
embodiment of the present invention.

As shown in Fig. 3, pitch marks p_1, p_2, ..., p_i, p_{i+1}
are arranged in each voiced portion at certain intervals,
but no pitch mark is present in any unvoiced portion.

First of all, it is checked in step S1 whether the
first segment of speech data to be processed is a voiced
or unvoiced portion. If it is determined that the first
segment is a voiced portion (YES in step S1), the flow
advances to step S2. If it is determined that the first
segment is an unvoiced portion (NO in step S1), the flow
advances to step S3.

In step S2, voiced portion start information
indicating that "the first segment is a voiced portion"
is recorded. In step S4, a first inter-pitch-mark
distance (distance between the first pitch mark p_1 and
the second pitch mark p_2 of the voiced portion) d_1 is
recorded in the pitch mark data file 101a. In step S5,
the value of a loop counter i is initialized to 2.

It is then checked in step S6 whether the voiced
portion ends with the ith pitch mark p_i indicated by the
value of the loop counter i. If it is determined that
the voiced portion does not end with the pitch mark p_i
(NO in step S6), the flow advances to step S7 to obtain
the difference (d_i - d_{i-1}) between an inter-pitch-mark
distance d_i and an inter-pitch-mark distance d_{i-1}. In
step S8, the obtained difference (d_i - d_{i-1}) is recorded
in the pitch mark data file 101a. In step S9, the loop
counter i is incremented by 1, and the flow returns to
step S6.
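
The recording in steps S4 to S9 can be illustrated with a
minimal sketch. The function name encode_voiced and the
sample pitch mark positions are assumptions introduced only
for this illustration; d_i is taken to be the distance from
pitch mark p_i to p_{i+1}, which agrees with p_2 = p_1 + d_1.

```python
def encode_voiced(pitch_marks):
    """Return [d_1, d_2 - d_1, d_3 - d_2, ...] for one voiced portion."""
    if len(pitch_marks) < 2:
        raise ValueError("a voiced portion needs at least two pitch marks")
    # d_i is the distance from pitch mark p_i to pitch mark p_{i+1}
    distances = [b - a for a, b in zip(pitch_marks, pitch_marks[1:])]
    record = [distances[0]]              # step S4: first distance d_1
    for prev, cur in zip(distances, distances[1:]):
        record.append(cur - prev)        # steps S7 and S8: d_i - d_{i-1}
    return record


# Hypothetical pitch mark positions (in samples) of one voiced portion
marks = [100, 180, 262, 345, 426]
print(encode_voiced(marks))  # [80, 2, 1, -2]
```

Because adjacent inter-pitch-mark distances tend to be
similar, the recorded differences are small values even when
the pitch mark positions themselves are large.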

If it is determined that the voiced portion ends
(YES in step S6), the flow advances to step S10 to
record a voiced portion end signal indicating the end of
the voiced portion in the pitch mark data file 101a.
Note that any signal can be used as the voiced portion
end signal as long as it can be discriminated from an
inter-pitch-mark distance. In step S11, it is checked
whether the speech data has ended. If it is determined
that the speech data has not ended (NO in step S11), the
flow advances to step S12. If it is determined that the
speech data has ended (YES in step S11), the processing
is terminated.

If it is determined in step S1 that the first segment
of the speech data is an unvoiced portion (NO in step
S1), the flow advances to step S3 to record unvoiced
portion start information indicating that "the first
segment is an unvoiced portion" in the pitch mark data
file 101a. In step S12, a distance d_s between the voiced
portion and the next voiced portion (i.e., the length of
the unvoiced portion) is recorded in the pitch mark data
file 101a. In step S13, it is checked whether the speech
data has ended. If it is determined that the speech data
has not ended (NO in step S13), the flow advances to
step S4. If it is determined that the speech data has
ended (YES in step S13), the processing is terminated.
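
The overall flow of Fig. 2 can be sketched as follows. This
is only an illustration under stated assumptions: the tokens
"V", "U" and "END", the function names, and the list-based
representation of the file are hypothetical, and the speech
data is assumed to be given as alternating voiced and
unvoiced segments with the unvoiced length d_s already known.

```python
VOICED_START = "V"     # hypothetical voiced portion start information
UNVOICED_START = "U"   # hypothetical unvoiced portion start information
VOICED_END = "END"     # hypothetical voiced portion end signal


def encode_voiced(pitch_marks):
    """Return [d_1, d_2 - d_1, ...] for one voiced portion (S4 to S9)."""
    distances = [b - a for a, b in zip(pitch_marks, pitch_marks[1:])]
    return [distances[0]] + [c - p for p, c in zip(distances, distances[1:])]


def build_pitch_mark_file(segments):
    """segments alternates ("voiced", [p_1, ...]) and ("unvoiced", d_s)."""
    if not segments:
        return []
    first_kind = segments[0][0]
    # Steps S2 / S3: record how the speech data starts
    records = [VOICED_START if first_kind == "voiced" else UNVOICED_START]
    for kind, value in segments:
        if kind == "voiced":
            records.extend(encode_voiced(value))  # steps S4 to S9
            records.append(VOICED_END)            # step S10
        else:
            records.append(value)                 # step S12: length d_s
    return records


segments = [
    ("unvoiced", 40),
    ("voiced", [40, 120, 202, 285]),
    ("unvoiced", 60),
    ("voiced", [345, 425, 503]),
]
print(build_pitch_mark_file(segments))
# ['U', 40, 80, 2, 1, 'END', 60, 80, -2, 'END']
```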

As described above, according to the first
embodiment, since the respective pitch marks in each
voiced portion are managed by using the distances
between the adjacent pitch marks, all the pitch marks in
each voiced portion need not be managed. This can reduce
the size of the pitch mark data file 101a.
In the first embodiment, step S10 may be replaced
with step S14 of counting the number (n) of pitch marks
in each voiced portion and step S15 of recording the
counted number n of pitch marks in the pitch mark data
file 101a, as shown in Fig. 4. In this case, the
processing in step S6 amounts to checking whether the
value of the loop counter i is equal to the number n of
pitch marks.

Another example of the processing of recording
pitch marks of each voiced portion in the first
embodiment will be described with reference to Fig. 5.

Fig. 5 is a flow chart showing another example of
the processing of recording pitch marks of each voiced
portion in the first embodiment of the present invention.
For example, the data length of speech data to be
processed is represented by d, and a maximum value dmax
(e.g., 127) and a minimum value dmin (e.g., -127) are
defined for a given word length (e.g., 8 bits).
First of all, in step S16, d is compared with dmax.
If d is equal to or larger than dmax (YES in step S16),
the flow advances to step S17 to record the maximum
value dmax in the pitch mark data file 101a. In step S18,
dmax is subtracted from d, and the flow returns to step
S16. If it is determined that d is smaller than dmax (NO
in step S16), the flow advances to step S19.

In step S19, d is compared with dmin. If d is
equal to or smaller than dmin (YES in step S19), the
flow advances to step S20 to record the minimum value
dmin in the pitch mark data file 101a. In step S21, dmin
is subtracted from d, and the flow returns to step S19.
If it is determined that d is larger than dmin (NO in
step S19), the flow advances to step S22 to record d.
The processing is then terminated.
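
Steps S16 to S22 can be sketched as below, using the example
values dmax = 127 and dmin = -127 given above. The function
name split_into_words and the returned list (standing in for
writes to the pitch mark data file 101a) are assumptions made
for this illustration.

```python
DMAX = 127    # example maximum value for an 8-bit word
DMIN = -127   # example minimum value for an 8-bit word


def split_into_words(d, dmax=DMAX, dmin=DMIN):
    """Split d into values that each fit in one word (steps S16 to S22)."""
    words = []
    while d >= dmax:         # step S16
        words.append(dmax)   # step S17: record dmax
        d -= dmax            # step S18
    while d <= dmin:         # step S19
        words.append(dmin)   # step S20: record dmin
        d -= dmin            # step S21
    words.append(d)          # step S22: record the remainder
    return words


print(split_into_words(300))   # [127, 127, 46]
print(split_into_words(-200))  # [-127, -73]
print(split_into_words(127))   # [127, 0]
print(split_into_words(5))     # [5]
```

Every recorded value lies between dmin and dmax, and the
last value of each group is strictly between them, so a
reader can tell where one value ends and the next begins.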

With this recording, for example, dmin-1 (-128 in
the above case) can be used as a voiced portion end
signal.

[Second Embodiment] 

In the second embodiment, pitch mark data file
loading processing of loading data from the pitch mark
data file 101a recorded in the first embodiment will be
described with reference to Fig. 6.

Fig. 6 is a flow chart showing pitch mark data
file loading processing executed in the second
embodiment of the present invention.
First of all, in step S23, start information
indicating whether the start of speech data to be
processed is a voiced or unvoiced portion is loaded from
the pitch mark data file 101a. It is then checked in step
S24 whether the loaded start information is voiced
portion start information. If voiced portion start
information is determined (YES in step S24), the flow
advances to step S25 to load a first inter-pitch-mark
distance (distance between a first pitch mark p_1 and a
second pitch mark p_2 of the voiced portion) d_1 from the
pitch mark data file 101a. Note that the second pitch
mark p_2 is located at p_1 + d_1.

In step S26, the value of a loop counter i is
initialized to 2. In step S27, a difference d_r (data
corresponding to the length of one word) is loaded from
the pitch mark data file 101a. In step S28, it is checked
whether the loaded difference d_r is a voiced portion end
signal. If it is determined that the difference is not a
voiced portion end signal (NO in step S28), the flow
advances to step S29 to calculate a next inter-pitch-mark
distance d_i and pitch mark position p_{i+1} from the
loaded difference d_r and from a pitch mark position p_i
and inter-pitch-mark distance d_{i-1} obtained in the past.



The following equations can be formulated from p_i,
d_{i-1}, d_r, d_i, and p_{i+1}. The next inter-pitch-mark
distance d_i and pitch mark position p_{i+1} can be
calculated by using these equations.

d_i = d_{i-1} + d_r    ... (1)

p_{i+1} = p_i + d_i    ... (2)
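
As an illustration of equations (1) and (2), the following
sketch rebuilds the pitch mark positions of one voiced
portion. The function name decode_voiced is hypothetical, the
position p_1 of the first pitch mark is assumed to be known
already, and the voiced portion end signal is assumed to have
been removed from the record.

```python
def decode_voiced(p1, record):
    """Rebuild pitch marks from p_1 and the record [d_1, d_r, d_r, ...]."""
    d = record[0]               # step S25: first distance d_1
    marks = [p1, p1 + d]        # p_2 = p_1 + d_1
    for d_r in record[1:]:      # step S27: next loaded difference
        d = d + d_r             # equation (1): d_i = d_{i-1} + d_r
        marks.append(marks[-1] + d)  # equation (2): p_{i+1} = p_i + d_i
    return marks


print(decode_voiced(100, [80, 2, 1, -2]))  # [100, 180, 262, 345, 426]
```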

In step S30, the loop counter i is incremented by
1. The flow then returns to step S27.

If it is determined that d_r is a voiced portion end
signal (YES in step S28), the flow advances to step S31
to check whether the speech data has ended. If it is
determined that the speech data has not ended (NO in
step S31), the flow advances to step S32. If it is
determined that the speech data has ended (YES in step
S31), the processing is terminated.

If it is determined in step S24 that the loaded
information is not voiced portion start information (NO
in step S24), the flow advances to step S32 to load a
distance d_s to the next voiced portion from the pitch
mark data file 101a. It is then checked in step S33
whether the speech data has ended. If it is determined
that the speech data has not ended (NO in step S33), the
flow advances to step S25. If it is determined that the
speech data has ended (YES in step S33), the processing
is terminated.
As described above, according to the second
embodiment, since pitch marks can be loaded by using the
pitch mark data file 101a managed by the processing
described in the first embodiment, the size of data to
be processed decreases to improve the processing
efficiency.

Another example of the processing of loading pitch
marks of each voiced portion in the second embodiment
will be described with reference to Fig. 7.

Fig. 7 is a flow chart showing another example of
the processing of loading pitch marks of each voiced
portion in the second embodiment of the present
invention.

Assume that the data length information of loaded
speech data is stored in a register d, and a maximum
value dmax (e.g., 127), a minimum value dmin (e.g., -127),
and a voiced portion end signal are defined for a given
word length (e.g., 8 bits) as in Fig. 5.

First of all, in step S34, the register d is
initialized to 0. In step S35, the data d_r corresponding
to the length of one word is loaded from the pitch mark
data file 101a. It is then checked in step S36 whether
d_r is a voiced portion end signal. If it is determined
that d_r is a voiced portion end signal (YES in step
S36), the processing is terminated. If it is determined
that d_r is not a voiced portion end signal (NO in step
S36), the flow advances to step S37 to add d_r to the
contents of the register d.

In step S38, it is checked whether d_r is equal to
dmax or dmin. If it is determined that they are equal
(YES in step S38), the flow returns to step S35. If it
is determined that they are not equal (NO in step S38),
the processing is terminated.
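
Steps S34 to S38 can be sketched as follows. The names
load_value and END_SIGNAL are assumptions for this
illustration; the end signal is taken to be dmin-1 (-128) as
suggested in the first embodiment, and the file is modeled as
a Python iterator of word-sized integers so that repeated
calls continue from the current read position.

```python
DMAX, DMIN = 127, -127
END_SIGNAL = DMIN - 1   # -128, assumed voiced portion end signal


def load_value(words):
    """Read one value from an iterator of words; None on the end signal."""
    d = 0                                  # step S34: clear register d
    for d_r in words:                      # step S35: load one word
        if d_r == END_SIGNAL:              # step S36
            return None
        d += d_r                           # step S37
        if d_r != DMAX and d_r != DMIN:    # step S38: value is complete
            return d
    raise ValueError("word stream ended before a complete value was read")


stream = iter([127, 127, 46, -127, -73, 5, END_SIGNAL])
print(load_value(stream))  # 300
print(load_value(stream))  # -200
print(load_value(stream))  # 5
print(load_value(stream))  # None
```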
Note that the present invention may be applied to
either a system constituted by a plurality of equipments
(e.g., a host computer, an interface device, a reader, a
printer, and the like) or an apparatus consisting of a
single equipment (e.g., a copying machine, a facsimile
apparatus, or the like).

The objects of the present invention are also
achieved by supplying a storage medium, which records a
program code of a software program that can realize the
functions of the above-mentioned embodiments, to the
system or apparatus, and reading out and executing the
program code stored in the storage medium by a computer
(or a CPU or MPU) of the system or apparatus.

In this case, the program code itself read out 
from the storage medium realizes the functions of the 
above-mentioned embodiments, and the storage medium 
which stores the program code constitutes the present 
invention. 

As the storage medium for supplying the program 
code, for example, a floppy disk, hard disk, optical 
disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, 
nonvolatile memory card, ROM, and the like may be used. 

The functions of the above-mentioned embodiments 
may be realized not only by executing the readout 
program code by the computer but also by some or all of 
actual processing operations executed by an OS 
(operating system) running on the computer on the basis 
of an instruction of the program code. 

Furthermore, the functions of the above-mentioned 
embodiments may be realized by some or all of actual 
processing operations executed by a CPU or the like 
arranged in a function extension board or a function 
extension unit, which is inserted in or connected to the 
computer, after the program code read out from the 
storage medium is written in a memory of the extension
board or unit.

As many apparently widely different embodiments of 
the present invention can be made without departing from 
the spirit and scope thereof, it is to be understood 
that the invention is not limited to the specific
embodiments thereof except as defined in the appended
claims.